
    Consistency of cross validation for comparing regression procedures

    Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when it is used to compare parametric with nonparametric methods, or nonparametric methods with one another. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of smaller order than the proportion used for estimation without affecting the consistency property. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/009053607000000514.
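The splitting scheme the abstract describes can be sketched as follows: hold out a fraction of the data for evaluation, fit each competing procedure on the rest, and pick the procedure with the smaller held-out error averaged over random splits. The function names, the fixed split ratio, and the use of squared-error loss are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def cv_compare(x, y, fit_a, fit_b, train_frac=0.5, n_splits=100, rng=None):
    """Choose between two regression procedures by repeated sample splitting.

    fit_a / fit_b: callables taking (x_train, y_train) and returning a
    prediction function. train_frac is the data-splitting ratio whose
    role the paper analyzes. Hypothetical helper, illustrative only.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    n_train = int(train_frac * n)
    errs = np.zeros(2)
    for _ in range(n_splits):
        perm = rng.permutation(n)
        tr, ev = perm[:n_train], perm[n_train:]
        for j, fit in enumerate((fit_a, fit_b)):
            pred = fit(x[tr], y[tr])
            # Squared-error loss on the evaluation portion
            errs[j] += np.mean((y[ev] - pred(x[ev])) ** 2)
    return ("A", "B")[int(np.argmin(errs))], errs / n_splits
```

For example, comparing a simple linear fit against a constant-mean fit on data with a genuine linear signal should select the linear procedure with high probability.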

    Maximum Lq-likelihood estimation

    In this paper, the maximum Lq-likelihood estimator (MLqE), a new parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30--35], is introduced. The properties of the MLqE are studied via asymptotic analysis and computer simulations. The behavior of the MLqE is characterized by the degree of distortion q applied to the assumed model. When q is properly chosen for small and moderate sample sizes, the MLqE can successfully trade bias for precision, resulting in a substantial reduction of the mean squared error. When the sample size is large and q tends to 1, a necessary and sufficient condition is established to ensure proper asymptotic normality and efficiency of the MLqE. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/09-AOS687.
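Concretely, the Lq-function replaces log u with (u^(1-q) - 1)/(1 - q), which recovers the ordinary log-likelihood as q tends to 1. For a normal model, the resulting estimating equations weight each observation by its model density raised to the power 1 - q, so with q < 1 tail points are downweighted; they can be solved by a simple fixed-point reweighting. The sketch below is one illustrative reading of the estimator for the normal case, not the authors' code.

```python
import numpy as np

def mlq_normal(x, q, n_iter=200, tol=1e-10):
    """Maximum Lq-likelihood estimate of (mu, sigma^2) under a normal model.

    Fixed-point iteration implied by the Lq estimating equations: each
    point receives weight f(x_i; mu, var)^(1-q). Illustrative sketch.
    """
    mu, var = np.mean(x), np.var(x)
    for _ in range(n_iter):
        dens = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = dens ** (1.0 - q)          # q < 1 downweights the tails
        mu_new = np.sum(w * x) / np.sum(w)
        var_new = np.sum(w * (x - mu_new) ** 2) / np.sum(w)
        converged = abs(mu_new - mu) + abs(var_new - var) < tol
        mu, var = mu_new, var_new
        if converged:
            break
    return mu, var
```

Note that for q < 1 the variance estimate is deliberately shrunk relative to the MLE; this is one face of the bias-for-precision trade-off the abstract mentions.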

    Sparsity Oriented Importance Learning for High-dimensional Linear Regression

    Model selection uncertainty is now well recognized to be non-negligible, so data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been provided for them. In this paper, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective, accounting for variable selection uncertainty through a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights properly concentrate around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables outside the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings, along with real data examples with guided simulations, show desirable properties of the SOIL importance in contrast to other importance measures.
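A minimal sketch of the model-weighting idea: fit the candidate sparse models, weight each one, and score a variable by the total weight of the models that contain it. The sketch below enumerates small subsets and uses exp(-BIC/2) weights; BIC weighting is just one plausible choice here for illustration, and the paper itself develops and justifies specific weighting schemes, so treat none of this as the authors' exact method.

```python
import itertools
import numpy as np

def soil_style_importance(X, y, max_size=3):
    """Variable importance as a weighted count over candidate subset models.

    Each subset model gets a weight proportional to exp(-BIC/2); the
    importance of variable j is the total weight of models containing j.
    Illustrative sketch with a hypothetical weighting choice.
    """
    n, p = X.shape
    models, bics = [], []
    for size in range(max_size + 1):
        for subset in itertools.combinations(range(p), size):
            # Least-squares fit with an intercept on the chosen columns
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            bic = n * np.log(rss / n) + (size + 1) * np.log(n)
            models.append(subset)
            bics.append(bic)
    bics = np.array(bics)
    w = np.exp(-(bics - bics.min()) / 2)
    w /= w.sum()
    imp = np.zeros(p)
    for subset, wk in zip(models, w):
        for j in subset:
            imp[j] += wk
    return imp
```

When the weights concentrate on the true model, variables in that model get importance near 1 and the rest near 0, which is the inclusion/exclusion behavior the abstract describes.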

    Forecast Combination Under Heavy-Tailed Errors

    Forecast combination has proven to be a very important technique for obtaining accurate predictions. In many applications, forecast errors exhibit heavy-tailed behavior for various reasons. Unfortunately, to our knowledge, little has been done to handle forecast combination in such situations. Familiar forecast combination methods, such as the simple average, least squares regression, or methods based on the variance-covariance matrix of the forecasts, may perform very poorly. In this paper, we propose two nonparametric forecast combination methods to address the problem. One is designed for situations in which the forecast errors are strongly believed to have heavy tails that can be modeled by a scaled Student's t-distribution; the other is designed for relatively more general situations in which there is a lack of strong or consistent evidence on the tail behavior of the forecast errors, due to a shortage of data and/or an evolving data generating process. Adaptive risk bounds for both methods are developed. Simulations and a real example show the superior performance of the new methods.
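To illustrate why the loss function matters under heavy tails, here is a generic sequential combination scheme with exponential weights on absolute-error loss: a few enormous errors shrink a forecaster's weight without the explosive penalties that squared loss would impose. This is a standard exponential-weighting sketch in the spirit of the problem, not the authors' proposed algorithms.

```python
import numpy as np

def robust_combine(forecasts, y, eta=1.0):
    """Sequentially combine K forecasters with exponential weights.

    forecasts: (T, K) array of forecasts over T periods; y: (T,) realized
    values. Absolute loss (rather than squared loss) keeps a single
    heavy-tailed error from dominating the weight updates. Illustrative.
    """
    T, K = forecasts.shape
    log_w = np.zeros(K)
    combined = np.zeros(T)
    for t in range(T):
        # Normalize weights in log space for numerical stability
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        combined[t] = w @ forecasts[t]
        # Penalize each forecaster by its realized absolute error
        log_w -= eta * np.abs(forecasts[t] - y[t])
    return combined
```

With one accurate forecaster and one whose errors are heavy-tailed, this combination quickly concentrates weight on the accurate one, whereas the simple average keeps absorbing half of every extreme error.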